Our presentation video link: https://www.dropbox.com/s/e04cl4d76pjm0rb/5206Presentation.mov?dl=0
Economists have long stressed the importance of the labor force because of its close connection with economic growth. This connection appears in the Solow-Swan model, a neoclassical (exogenous) growth model: \[Y_t = K_t^\alpha(A_tL_t)^{1-\alpha}\] where \(t\) denotes time, \(K_t\) is capital, \(L_t\) is labor, \(0 < \alpha < 1\) is the elasticity of output with respect to capital, and \(Y_t\) represents total production. \(A\) refers to labor-augmenting technology or “knowledge”, so \(AL\) represents effective labor.
Robert Solow and Trevor Swan (1956) explain economic growth through capital, labor, and knowledge, assuming that the level of technology is exogenous and the same across countries. In this framework, differences in GDP per capita across countries reflect differences in capital accumulation, the labor force, and population growth.
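Dividing the production function by effective labor \(A_tL_t\) gives its per-effective-worker form, which makes the role of labor explicit. Writing \(y_t = Y_t/(A_tL_t)\) and \(k_t = K_t/(A_tL_t)\):

\[
y_t = \frac{K_t^\alpha (A_tL_t)^{1-\alpha}}{A_tL_t} = K_t^\alpha (A_tL_t)^{-\alpha} = \left(\frac{K_t}{A_tL_t}\right)^\alpha = k_t^\alpha
\]

Because the exponents \(\alpha\) and \(1-\alpha\) sum to one, scaling both \(K_t\) and \(L_t\) by \(\lambda > 0\) scales \(Y_t\) by \(\lambda\) (constant returns to scale). Holding \(K_t\) fixed, a larger labor force raises total output \(Y_t\) but lowers capital per effective worker \(k_t\).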
The fertility rate determines how quickly new labor force is generated, so the accumulation of labor force is closely related to fertility. In recent decades, however, fertility has declined in many countries, and in some developed countries, such as the United States, Korea, and Japan, it has fallen below the replacement rate.
Fertility Rate All Over the World
Low fertility brings problems such as an aging population and slower economic growth. We are therefore interested in which factors are related to the fertility rate, and we aim to make recommendations that could alleviate the problems caused by declining fertility. Meanwhile, house prices have risen dramatically in many countries over the past few decades, as the upward trend in the global real house price index shows.
Housing Price Index
Some argue that rising house prices are what lowers the fertility rate. However, the plot below suggests a positive relationship. Using a country-level dataset, we investigate whether high housing prices decrease the fertility rate.
Fertility Rate vs. Housing Price Index
| Variable | Description | Source |
|---|---|---|
| fer | Fertility rate, total (births per woman) | World Bank |
| gdp | GDP | World Bank |
| lab | Labor force participation rate, female (% of female population ages 15+) | World Bank |
| edu | School enrollment, secondary, female (% gross) | World Bank |
| cpi | Consumer Price Index | World Bank |
| unemp | Unemployment Rate | World Bank |
| pop | Population | World Bank |
| hou | Housing Price Index | OECD |
| ten | Housing tenure (share of households that own their housing unit) | UN Data, American Housing Survey |
| sav | Household Savings, % of household disposable income | OECD Data |
| asset | Household Financial Assets, US dollars/capita | OECD Data |
To analyze the potential causal effect of housing prices on the fertility rate, we need to deal with the confounders of the two. We combine two approaches. First, panel-data regression with fixed or random effects addresses unobserved country-level heterogeneity; fixed effects in particular removes all time-invariant confounders. Second, following Cevat Giray Aksoy (2016), a main confounder between fertility and housing prices is personal/household wealth, so we include typical measures of household wealth, such as household financial assets and household savings. We start with an ordinary linear regression.
Import some packages and load data into our environment!
library(tidyverse)
library(readxl)
library(VIM)
library(imputeTS)
library(broom)
library(knitr)
library(olsrr)
library(MASS)
library(psych)
library(jtools)
library(boot)
library(plm)
fer <- read_csv("fertility.csv")
gdp <- read_excel("gdp.xls")
hou <- read_csv("house_price.csv")
lab <- read_csv("female_labor_force_participation.csv")
edu <- read_csv("school_enrollment_secondary.csv")
ten <- read_csv("house_tenure.csv")
cpi <- read_csv("cpi.csv")
unemp <- read_csv("unemployment.csv")
pop <- read_csv("population.csv")
asset <- read_csv("financial_asset.csv")
sav <- read_csv("household_saving.csv")
Clean and combine the data!
fer <- fer %>%
dplyr::select(-names(fer)[2:4]) %>%
rename(country = `Country Name`)
fer <- fer %>%
pivot_longer(names(fer)[-1], names_to = "year", values_to = "fer")
fer$year <- as.numeric(fer$year)
gdp <- gdp %>%
dplyr::select(-names(gdp)[2:4]) %>%
rename(country = `Country Name`)
gdp <- gdp %>%
pivot_longer(names(gdp)[-1], names_to = "year", values_to = "gdp")
gdp$year <- as.numeric(gdp$year)
hou <- hou %>%
dplyr::select(Country, Time, Value) %>%
filter(str_length(Time) == 4) %>%
rename(country = Country, year = Time, hou = Value) %>%
group_by(country, year) %>%
summarize(hou = mean(hou))
hou$year <- as.numeric(hou$year)
lab <- lab %>%
dplyr::select(-names(lab)[2:4]) %>%
rename(country = `Country Name`)
lab <- lab %>%
pivot_longer(names(lab)[-1], names_to = "year", values_to = "lab")
lab$year <- as.numeric(lab$year)
edu <- edu %>%
dplyr::select(-names(edu)[2:4]) %>%
rename(country = `Country Name`)
edu <- edu %>%
pivot_longer(names(edu)[-1], names_to = "year", values_to = "edu")
edu$year <- as.numeric(edu$year)
ten <- ten %>%
filter(Area == "Total" & `Type of housing unit` == "Total") %>%
rename(country = `Country or Area`, year = Year, tenure = Tenure, value = Value) %>%
dplyr::select(country, year, tenure, value) %>%
pivot_wider(names_from = tenure, values_from = value) %>%
mutate(ten = `Member of household owns the housing unit` / Total) %>%
dplyr::select(country, year, ten) %>%
drop_na()
# add US tenure ratios (owners / total) manually; they come from smaller sources such as the American Housing Survey
ten <- ten %>%
add_row(country = "United States", year = 2019, ten = 79475/124135) %>%
add_row(country = "United States", year = 2017, ten = 77567/121560) %>%
add_row(country = "United States", year = 2015, ten = 74299/118290) %>%
add_row(country = "United States", year = 2013, ten = 75650/115852) %>%
add_row(country = "United States", year = 2011, ten = 76053/114833)
# average over the available years: ten becomes a time-invariant, country-level variable
ten <- ten %>%
group_by(country) %>%
summarize(ten = mean(ten))
cpi <- cpi %>%
dplyr::select(-names(cpi)[2:4]) %>%
rename(country = `Country Name`)
cpi <- cpi %>%
pivot_longer(names(cpi)[-1], names_to = "year", values_to = "cpi")
cpi$year <- as.numeric(cpi$year)
unemp <- unemp %>%
dplyr::select(-names(unemp)[2:4]) %>%
rename(country = `Country Name`)
unemp <- unemp %>%
pivot_longer(names(unemp)[-1], names_to = "year", values_to = "unemp")
unemp$year <- as.numeric(unemp$year)
pop <- pop %>%
dplyr::select(-names(pop)[2:4]) %>%
rename(country = `Country Name`)
pop <- pop %>%
pivot_longer(names(pop)[-1], names_to = "year", values_to = "pop")
pop$year <- as.numeric(pop$year)
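# lookup table of country names and country codes, used to label the OECD data (keyed by LOCATION code)
fer_temp <- read_csv("fertility.csv")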
fer_temp <- fer_temp %>%
dplyr::select(`Country Name`, `Country Code`)
asset <- asset %>%
filter(SUBJECT == "TOT") %>%
rename(`Country Code` = LOCATION) %>%
left_join(fer_temp) %>%
rename(country = `Country Name`, year = TIME, asset = Value) %>%
dplyr::select(country, year, asset)
sav <- sav %>%
rename(`Country Code` = LOCATION) %>%
left_join(fer_temp) %>%
rename(country = `Country Name`, year = TIME, sav = Value) %>%
dplyr::select(country, year, sav)
data_joined <- fer %>%
inner_join(gdp) %>%
inner_join(hou) %>%
inner_join(lab) %>%
inner_join(edu) %>%
inner_join(cpi) %>%
inner_join(unemp) %>%
inner_join(pop) %>%
inner_join(asset) %>%
inner_join(sav) %>%
left_join(ten) %>%
drop_na()
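data <- data_joined %>%
mutate(
gdp_per_log = log(gdp / pop),  # log GDP per capita; computed before pop is logged below
hou = hou / 100, cpi = cpi / 100,  # price indices rescaled by 100
lab = lab / 100, edu = edu / 100, unemp = unemp / 100, sav = sav / 100,  # percentages to proportions
pop = log(pop),
asset = log(asset)
) %>%
dplyr::select(-gdp)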
data_for_linear <- data %>%
dplyr::select(-country, -year)
Before fitting the models, we rescale several variables. We compute GDP per capita as GDP divided by population, because GDP per capita is a standard comparative indicator of economic performance that lets us compare living standards across countries, and we take its log to compress its range. Population and household financial assets per capita are also logged. Finally, the variables measured in percent (lab, edu, unemp, sav) and the two indices (hou, cpi) are divided by 100, so that percentages become proportions and all predictors are on comparable scales.
pairs.panels(data_for_linear)
From the scatter plot matrix, we would expect housing prices to have a negative relationship with the fertility rate.
Let’s look at the full linear regression model first.
full <- lm(fer ~ ., data = data_for_linear)
summary(full)
##
## Call:
## lm(formula = fer ~ ., data = data_for_linear)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52421 -0.19222 -0.01779 0.15653 0.75660
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.25283 0.30196 0.837 0.402854
## hou 0.02194 0.06043 0.363 0.716705
## lab 0.96154 0.19617 4.901 1.33e-06 ***
## edu 0.26407 0.06852 3.854 0.000133 ***
## cpi -0.82767 0.56988 -1.452 0.147091
## unemp -2.50697 0.32344 -7.751 6.04e-14 ***
## pop 0.06749 0.00889 7.591 1.82e-13 ***
## asset -0.14102 0.03304 -4.268 2.41e-05 ***
## sav 0.22818 0.23152 0.986 0.324856
## ten 0.13802 0.11772 1.172 0.241621
## gdp_per_log 0.10899 0.03579 3.045 0.002463 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2373 on 454 degrees of freedom
## Multiple R-squared: 0.3131, Adjusted R-squared: 0.298
## F-statistic: 20.69 on 10 and 454 DF, p-value: < 2.2e-16
Next, let’s run some simple assumption checks for the ordinary linear regression model.
What can go wrong?
Our regression model requires some assumptions. The residuals should:
- be normally distributed;
- be independent;
- have the same variance.
Basic idea of diagnostic measures: if the model is correct, then the residuals \(e_i = Y_i - \hat{Y_i}, 1\leq i \leq n\) should look like a sample of (not quite independent) \(N(0,\sigma^2)\) random variables.
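The visual checks can be complemented by formal tests. Below is a minimal sketch on simulated data. The variables x1, x2 and y are hypothetical stand-ins, not the report's data. It uses base R's Shapiro-Wilk test for normality. For constant variance it computes Koenker's studentized Breusch-Pagan statistic by hand, as n·R² from regressing the squared residuals on the predictors. On the real model one would use `residuals(full)` and the predictors of `data_for_linear`. The `lmtest` package's `bptest()` implements the same statistic by default, but that package is not loaded in this analysis.

```r
# Minimal sketch on simulated data; x1, x2 and y are hypothetical stand-ins
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- runif(n)
y <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 0.25)
fit <- lm(y ~ x1 + x2)
res <- residuals(fit)

# Normality: Shapiro-Wilk test (base R; requires 3 <= n <= 5000)
sw <- shapiro.test(res)

# Constant variance: Koenker's studentized Breusch-Pagan test,
# LM = n * R^2 from regressing squared residuals on the predictors
aux <- lm(I(res^2) ~ x1 + x2)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)  # df = number of predictors

cat("Shapiro-Wilk p-value:", sw$p.value, "\n")
cat("Breusch-Pagan LM =", bp_stat, "p-value:", bp_p, "\n")
```

Small p-values would point to non-normal or heteroskedastic residuals. Neither test addresses independence, which needs separate attention with country-year panel data.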
Therefore, we are going to check all the assumptions.
par(mfrow = c(2,2))
plot(full, pch = 23, bg = 'orange', cex = 1)
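The plots above suggest that the residuals are approximately normally distributed with roughly constant variance. Judging by Cook’s distance, however, there are clearly some outliers. These plots do not directly check independence, which is questionable here because each country contributes several years of data.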
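You can list the influential observations directly instead of reading them off the plot: base R's `cooks.distance()` returns one distance per observation. The sketch below uses simulated data with two planted outliers; x and y are hypothetical. It flags points above 4/n, which is a common rule of thumb rather than a formal test. On the real model the call would be `cooks.distance(full)`.

```r
# Minimal sketch on simulated data; x and y are hypothetical stand-ins
set.seed(7)
n <- 100
x <- rnorm(n)
y <- 2 + x + rnorm(n, sd = 0.5)
y[c(10, 50)] <- y[c(10, 50)] + 6   # plant two gross outliers

fit <- lm(y ~ x)
cd <- cooks.distance(fit)

cutoff <- 4 / n                     # rule of thumb, not a formal test
flagged <- which(cd > cutoff)       # positions of observations above the cutoff
round(cd[flagged], 3)
```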
Since this is an ordinary multiple regression that ignores country-level fixed or random effects, its estimates may be biased by unobserved heterogeneity.
A fixed effects regression model controls for omitted variable bias due to unobserved heterogeneity that is constant over time, so it removes time-invariant confounding factors. A random effects model instead assumes that this heterogeneity is uncorrelated with the regressors. That makes it more efficient, but not robust to such confounding.
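The within estimator behind `plm(..., model = "within")` can be illustrated without plm. Subtracting each country's own mean from every variable removes anything that is constant within a country. Running least squares on the demeaned data then gives the same slope as adding one dummy per country. The sketch below is minimal and uses a simulated panel; the country labels, x, y and the true slope of 0.5 are all hypothetical.

The same demeaning turns any time-invariant regressor into a column of zeros. In this analysis, ten is averaged per country during cleaning, so the within model cannot estimate it. That is why ten has no row in the fixed effects output.

```r
# Minimal sketch on a simulated panel; ids, x and y are hypothetical
set.seed(1)
n_id <- 6
n_t <- 10
panel <- data.frame(
  id = rep(paste0("country_", seq_len(n_id)), each = n_t),
  year = rep(seq_len(n_t), times = n_id)
)
effect <- rep(rnorm(n_id, sd = 2), each = n_t)   # unobserved, time-invariant
panel$x <- effect + rnorm(n_id * n_t)            # x is correlated with the effect
panel$y <- 0.5 * panel$x + effect + rnorm(n_id * n_t, sd = 0.1)

# Within transformation: subtract each country's own mean
demean <- function(v, g) v - ave(v, g)
panel$y_w <- demean(panel$y, panel$id)
panel$x_w <- demean(panel$x, panel$id)
# Same point estimate as the within model; lm()'s standard error here is not
# corrected for the degrees of freedom used up by the country means
fit_within <- lm(y_w ~ x_w - 1, data = panel)

# Equivalent least-squares dummy-variable (LSDV) regression
fit_lsdv <- lm(y ~ x + factor(id), data = panel)

# Pooled OLS ignores the country effect
fit_pooled <- lm(y ~ x, data = panel)

b_within <- unname(coef(fit_within)["x_w"])
b_lsdv <- unname(coef(fit_lsdv)["x"])
b_pooled <- unname(coef(fit_pooled)["x"])
print(c(within = b_within, lsdv = b_lsdv, pooled = b_pooled))
```

The within and LSDV slopes agree up to rounding error. The pooled slope absorbs part of the country effect, and here it is expected to be biased upward because x was simulated to be positively correlated with that effect.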
fixed = plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data, index = c("country", "year"), model = "within")
summary(fixed)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = fer ~ hou + lab + edu + cpi + unemp + pop + asset +
## sav + ten + gdp_per_log, data = data, model = "within", index = c("country",
## "year"))
##
## Unbalanced Panel: n = 24, T = 4-25, N = 465
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.29111671 -0.04162546 -0.00015286 0.04563224 0.19298887
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## hou 0.191196 0.034262 5.5803 4.245e-08 ***
## lab 1.303811 0.207217 6.2920 7.697e-10 ***
## edu -0.235057 0.059760 -3.9333 9.758e-05 ***
## cpi -0.328527 0.208356 -1.5768 0.115584
## unemp -0.050466 0.165056 -0.3057 0.759943
## pop -1.289671 0.080468 -16.0272 < 2.2e-16 ***
## asset -0.042504 0.022756 -1.8679 0.062457 .
## sav -0.226603 0.108043 -2.0973 0.036544 *
## gdp_per_log 0.085504 0.025688 3.3285 0.000948 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 5.3898
## Residual Sum of Squares: 2.6349
## R-Squared: 0.51113
## Adj. R-Squared: 0.47492
## F-statistic: 50.1859 on 9 and 432 DF, p-value: < 2.22e-16
random = plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data, index = c("country", "year"), model = "random")
summary(random)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = fer ~ hou + lab + edu + cpi + unemp + pop + asset +
## sav + ten + gdp_per_log, data = data, model = "random", index = c("country",
## "year"))
##
## Unbalanced Panel: n = 24, T = 4-25, N = 465
##
## Effects:
## var std.dev share
## idiosyncratic 0.006099 0.078098 0.076
## individual 0.074098 0.272210 0.924
## theta:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8580 0.9343 0.9403 0.9360 0.9415 0.9427
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.32294 -0.05783 -0.00055 -0.00105 0.06544 0.27091
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 4.994666 0.896025 5.5742 2.486e-08 ***
## hou 0.123205 0.041081 2.9991 0.0027080 **
## lab 0.331086 0.234509 1.4118 0.1580002
## edu -0.227214 0.072371 -3.1396 0.0016920 **
## cpi -0.890455 0.251054 -3.5469 0.0003898 ***
## unemp -0.541037 0.197443 -2.7402 0.0061399 **
## pop -0.182482 0.041803 -4.3653 1.270e-05 ***
## asset -0.116410 0.027057 -4.3024 1.690e-05 ***
## sav -0.033421 0.131063 -0.2550 0.7987247
## ten -0.923720 0.652696 -1.4152 0.1569987
## gdp_per_log 0.152177 0.030650 4.9650 6.871e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 5.7679
## Residual Sum of Squares: 4.166
## R-Squared: 0.27813
## Adj. R-Squared: 0.26223
## Chisq: 148.777 on 10 DF, p-value: < 2.22e-16
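The theta row above is the random effects quasi-demeaning weight. Random effects subtracts \(\theta_i\) times each country's mean from every variable. With \(\theta_i = 0\) this is pooled OLS; with \(\theta_i = 1\) it is the within (fixed effects) estimator.

Assume the usual formula for an unbalanced panel, \(\theta_i = 1 - \sqrt{\sigma^2_\varepsilon / (\sigma^2_\varepsilon + T_i\sigma^2_u)}\). Under that assumption, the printed variance components reproduce the reported minimum and maximum of theta for countries observed 4 and 25 years. The sketch only re-does that arithmetic.

```r
# Variance components as printed in summary(random) above
sigma2_e <- 0.006099   # idiosyncratic
sigma2_u <- 0.074098   # individual (country)

# Quasi-demeaning weight for a country observed T_i years
theta <- function(T_i) 1 - sqrt(sigma2_e / (sigma2_e + T_i * sigma2_u))

round(theta(c(4, 25)), 4)   # about 0.858 and 0.943, the Min. and Max. of theta above
```

Because country effects account for about 92% of the error variance, theta is close to 1. This means the random effects transformation is already close to full within-country demeaning.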
Which is better, fixed effects or random effects? We run a Hausman test at the 0.05 significance level.
phtest(fixed, random)
##
## Hausman Test
##
## data: fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + ...
## chisq = 258.08, df = 9, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
Since the p-value is below 0.05, we reject the null hypothesis that the random effects estimator is consistent, and we choose the fixed effects model.
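For reference, the Hausman statistic compares the two coefficient vectors. It weights their difference by the difference of their estimated covariance matrices:

\[
H = (\hat\beta_{FE} - \hat\beta_{RE})^\top \left[\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})\right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2_k
\]

Under the null hypothesis, the country effects are uncorrelated with the regressors. In that case both estimators are consistent and random effects is efficient, so \(H\) should be small. The test uses only the \(k\) coefficients the two models share. The within model estimates neither the intercept nor the time-invariant ten, which leaves nine slopes; this matches df = 9 in the output above.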
The coefficient on housing price is 0.19. Because hou was divided by 100, this means that a 100-point increase in the housing price index (one unit of hou) is associated with an increase in fertility of about 0.19 births per woman, holding the other variables constant. This strongly contradicts the view that high housing prices reduce fertility. One explanation offered by some research is that houses are also financial assets with investment value. Rising house prices increase the wealth of people who already own homes, which may strengthen their willingness to have more children. Even for people who do not own homes, the expectation that house prices will keep rising may have a similar effect.
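Because hou was divided by 100 during cleaning, one unit of hou is 100 points of the raw index, which is a very large change. The sketch below converts the fixed effects coefficient into the implied change in births per woman for smaller changes in the raw index, holding the other regressors fixed. The chosen point changes are illustrative only.

```r
# Fixed effects coefficient on hou, copied from summary(fixed) above
b_hou <- 0.191196

# hou = raw index / 100, so a change of d raw points is a change of d / 100 in hou
raw_points <- c(1, 10, 100)           # illustrative changes in the raw index
delta_fer <- b_hou * raw_points / 100 # implied change in births per woman

data.frame(raw_points, delta_fer)
```

For example, a 10-point rise in the raw index corresponds to roughly 0.019 additional births per woman under this model.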
This raises a further question: if a country’s housing prices are already high, does a further increase still raise the fertility rate? We examine this in the following parameter uncertainty part.
We split the data into two groups at the median of the housing price index to see how the coefficients differ between the groups. The split is made by country-year observation, so the same country can contribute years to both groups.
data_low_hou <- data %>%
filter(hou <= median(data$hou))
data_high_hou <- data %>%
filter(hou > median(data$hou))
model_low_hou <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data_low_hou, index = c("country", "year"), model = "within")
model_high_hou <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data_high_hou, index = c("country", "year"), model = "within")
export_summs(fixed, model_low_hou, model_high_hou, model.names = c("Full Data", "Low Housing Price Group", "High Housing Price Group"))
| | Full Data | Low Housing Price Group | High Housing Price Group |
|---|---|---|---|
| hou | 0.19 *** | 0.23 *** | 0.01 |
| | (0.03) | (0.06) | (0.07) |
| lab | 1.30 *** | 0.02 | 3.09 *** |
| | (0.21) | (0.26) | (0.40) |
| edu | -0.24 *** | -0.15 | -0.14 |
| | (0.06) | (0.09) | (0.07) |
| cpi | -0.33 | 0.52 ** | -0.50 |
| | (0.21) | (0.20) | (0.32) |
| unemp | -0.05 | -0.16 | -0.68 ** |
| | (0.17) | (0.18) | (0.26) |
| pop | -1.29 *** | -1.15 *** | -1.36 *** |
| | (0.08) | (0.11) | (0.17) |
| asset | -0.04 | -0.02 | -0.12 *** |
| | (0.02) | (0.03) | (0.03) |
| sav | -0.23 * | -0.14 | 0.38 * |
| | (0.11) | (0.11) | (0.16) |
| gdp_per_log | 0.09 *** | 0.09 *** | 0.03 |
| | (0.03) | (0.02) | (0.04) |
| nobs | 465 | 233 | 232 |
| r.squared | 0.51 | 0.56 | 0.53 |
| adj.r.squared | 0.47 | 0.49 | 0.45 |
| statistic | 50.19 | 28.34 | 24.77 |
| p.value | 0.00 | 0.00 | 0.00 |
| deviance | 2.63 | 0.49 | 0.88 |
| df.residual | 432.00 | 201.00 | 199.00 |

Standard errors in parentheses. *** p < 0.001; ** p < 0.01; * p < 0.05.
plot_summs(fixed, model_low_hou, model_high_hou, scale = TRUE, robust = TRUE, inner_ci_level = 0.9, model.names = c("Full Data", "Low Housing Price Group", "High Housing Price Group"), coefs = "hou")
The table and plot show that the housing price coefficient is 0.23 in the low housing price group but close to 0 (0.01, not significant) in the high housing price group. This suggests that when housing prices are already high, further increases no longer encourage people to have more children; the incentive almost disappears.
We also split the data into two groups at the median of household financial assets.
data_poor <- data %>%
filter(asset <= median(data$asset))
data_rich <- data %>%
filter(asset > median(data$asset))
model_poor <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data_poor, index = c("country", "year"), model = "within")
model_rich <- plm(fer ~ hou + lab + edu + cpi + unemp + pop + asset + sav + ten + gdp_per_log, data = data_rich, index = c("country", "year"), model = "within")
export_summs(fixed, model_poor, model_rich, model.names = c("Full Data", "Less Asset Group", "More Asset Group"))
| | Full Data | Less Asset Group | More Asset Group |
|---|---|---|---|
| hou | 0.19 *** | 0.11 * | 0.19 *** |
| | (0.03) | (0.05) | (0.06) |
| lab | 1.30 *** | -0.07 | 1.15 *** |
| | (0.21) | (0.41) | (0.31) |
| edu | -0.24 *** | -0.14 | -0.14 |
| | (0.06) | (0.10) | (0.09) |
| cpi | -0.33 | -0.10 | -0.44 |
| | (0.21) | (0.23) | (0.48) |
| unemp | -0.05 | -0.43 * | 0.77 * |
| | (0.17) | (0.22) | (0.30) |
| pop | -1.29 *** | -1.46 *** | -2.00 *** |
| | (0.08) | (0.15) | (0.26) |
| asset | -0.04 | 0.03 | 0.01 |
| | (0.02) | (0.03) | (0.05) |
| sav | -0.23 * | -0.27 * | -0.25 |
| | (0.11) | (0.13) | (0.27) |
| gdp_per_log | 0.09 *** | 0.06 | 0.20 *** |
| | (0.03) | (0.03) | (0.04) |
| nobs | 465 | 233 | 232 |
| r.squared | 0.51 | 0.60 | 0.46 |
| adj.r.squared | 0.47 | 0.55 | 0.40 |
| statistic | 50.19 | 34.49 | 19.88 |
| p.value | 0.00 | 0.00 | 0.00 |
| deviance | 2.63 | 1.14 | 1.03 |
| df.residual | 432.00 | 203.00 | 208.00 |

Standard errors in parentheses. *** p < 0.001; ** p < 0.01; * p < 0.05.
plot_summs(fixed, model_poor, model_rich, scale = TRUE, robust = TRUE, inner_ci_level = 0.9, model.names = c("Full Data", "Less Asset Group", "More Asset Group"), coefs = "hou")
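Comparing the two groups, the housing price coefficient is larger in the more asset group (0.19) than in the less asset group (0.11). This suggests that households that already own more financial assets may respond more strongly to rising house prices, which is consistent with the wealth-effect explanation above.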
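Comparing two separately estimated coefficients by eye does not show whether they differ significantly. A rough check is a z statistic for the difference, \((b_1 - b_2)/\sqrt{se_1^2 + se_2^2}\). This treats the two subsample estimates as independent. That assumption holds only approximately here, because the same country can contribute years to both groups. The sketch below plugs in the rounded hou coefficients and standard errors from the two tables above, so the results are approximate.

```r
# hou coefficients and standard errors copied from the two tables above (rounded)
groups <- data.frame(
  split = c("housing price", "financial assets"),
  b1 = c(0.23, 0.11), se1 = c(0.06, 0.05),   # low price / less asset group
  b2 = c(0.01, 0.19), se2 = c(0.07, 0.06)    # high price / more asset group
)

# z statistic for the difference, treating the subsample estimates as independent
groups$z <- (groups$b1 - groups$b2) / sqrt(groups$se1^2 + groups$se2^2)
groups$p <- 2 * pnorm(-abs(groups$z))        # two-sided normal p-value
groups
```

With these inputs, the housing price split gives z of about 2.39 (p of about 0.02). The asset split gives z of about -1.02 (p of about 0.31). Under these assumptions, then, the gap between the two asset groups is not statistically distinguishable at the 5% level.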
This study has some limitations:
- We may be missing confounders of the fertility rate and housing prices.
- We may be missing terms, e.g. interaction terms or higher-order non-linear terms such as polynomial terms.
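One way to probe the non-linearity that the median split hints at is to add a squared housing term and compare nested models with an F-test. This was not done in this study. The sketch below uses simulated data with a built-in concave relationship; the names and numbers are hypothetical. The same `I(hou^2)` term could in principle be added to the `plm()` formulas used above.

```r
# Minimal sketch on simulated data; hou and fer here are hypothetical stand-ins
set.seed(3)
n <- 300
hou <- runif(n, 0.5, 1.5)                                    # a rescaled price index
fer <- 1.2 + 0.8 * hou - 0.6 * hou^2 + rnorm(n, sd = 0.05)   # built-in concave shape

linear <- lm(fer ~ hou)
quadratic <- lm(fer ~ hou + I(hou^2))  # I() makes ^ mean arithmetic squaring

# Nested-model F-test: does the squared term improve the fit?
cmp <- anova(linear, quadratic)
p_quad <- cmp[["Pr(>F)"]][2]

coef(quadratic)
p_quad
```

With a negative squared-term coefficient, the marginal effect of hou (here 0.8 - 1.2 * hou in the simulated truth) shrinks as hou rises. This is the kind of pattern the low and high housing price groups suggest.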
All in all, our fixed effects regression model (and even the ordinary linear regression model, whose housing price coefficient is small, positive and insignificant) indicates that although house prices do affect the fertility rate, high house prices do not lower the fertility rate as many people believe. We therefore recommend that policymakers not try to lower house prices in order to encourage people to have more children. This study still has unresolved limitations, as discussed above, that call for future research. Its aim is to inform policymakers about the relationship between house prices and the fertility rate, so that they understand it better and can make the right decisions.